STATS 32: Introduction to R for Undergraduates

Kenneth Tay

Oct 3, 2017

Agenda for today

The big data explosion



What is R?

Ross Ihaka & Rob Gentleman

Why learn R?

Reason #1: R was specifically designed for statistics and data analysis.

Map of US obesity rates (Source: stackoverflow.com)
Demo of the Central Limit Theorem (Source: yihui.name)

Why learn R?

“What software have you used for Analytics, Data Mining, Data Science and/or Machine Learning projects in the past 12 months?”

(Source: KDnuggets)

Why learn R?

Reason #3: It’s easy to get started with R.

The R Journal

Biannual open-access journal featuring short- to medium-length articles on topics of interest to R users and developers

R-bloggers

Blog aggregator of content contributed by bloggers who write about R

Examples of posts:

Stack Overflow

Q&A site for programmers

R-exercises

Website with both tutorials and exercises

The challenge of learning R

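df <- mtcars                # assumption: df is a data frame with an mpg column, e.g. the built-in mtcars
df[df$mpg > 30, ]           # base R: logical indexing
with(df, df[mpg > 30, ])    # base R: with()
subset(df, mpg > 30)        # base R: subset()
library(dplyr)              # filter() and %>% are provided by the dplyr package
filter(df, mpg > 30)
df %>% filter(mpg > 30)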
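
The lines above are different spellings of the same request: keep the rows where mpg exceeds 30. As a rough check, the sketch below compares the three base R forms. It assumes df is a copy of the built-in mtcars dataset; the slide does not say what df is. The names by_index, by_with and by_subset are made up for illustration.

```r
# Assumption: df stands in for any data frame with an mpg column;
# here we use the built-in mtcars dataset.
df <- mtcars

by_index  <- df[df$mpg > 30, ]          # logical indexing
by_with   <- with(df, df[mpg > 30, ])   # same indexing, columns looked up inside df
by_subset <- subset(df, mpg > 30)       # subset() helper

# all.equal() returns TRUE when two objects match
all.equal(by_index, by_with)
all.equal(by_index, by_subset)

# number of cars with mpg above 30
nrow(by_index)
```

The dplyr forms are left out so that the sketch runs without installing any package.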

Course objectives

By the end of this course, students will be able to:

Tentative overview of the course

This course is not…

Class logistics

Class logistics

What is a variable?

x <- 3
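
A minimal sketch of how the assignment above behaves in an R session. The name y and the values used are only for illustration.

```r
x <- 3        # store the value 3 under the name x
x             # typing the name prints the stored value

y <- x + 2    # a variable can be used inside an expression
y

x <- 10       # assigning again replaces the old value of x
y             # y keeps the value it was given earlier
```

Note that y was computed from x at the time of assignment, so changing x afterwards does not change y.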

Variable types

Confusion: 123 vs. “123”

How do we tell a numeric variable apart from a character variable that consists only of digits?
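
One way to see the difference is to ask R for the type of each value. The sketch below uses the base R functions class(), is.numeric(), is.character() and as.numeric(). The variable names a and b are made up for illustration.

```r
a <- 123      # a number
b <- "123"    # a character string that happens to contain digits

class(a)      # "numeric"
class(b)      # "character"

is.numeric(a)     # TRUE
is.character(b)   # TRUE

a + 1             # arithmetic works on numbers: 124
# b + 1 would stop with an error (non-numeric argument to binary operator)

as.numeric(b) + 1 # convert the string to a number first: 124
```

The quotation marks are what make the difference: anything typed inside quotes is treated as text, even if it looks like a number.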

Let’s try R!